Multilingual Speech Corpora for TTS System Development

Authors

  • Hsi-Chun Hsiao
  • Hsiu-Min Yu
  • Yih-Ru Wang
  • Sin-Horng Chen
Abstract

In this paper, four speech corpora collected in the Speech Lab of NCTU in recent years are discussed. They include a Mandarin treebank speech corpus, a Min-Nan speech corpus, a Hakka speech corpus, and a Chinese-English mixed speech corpus. Currently, they are used separately to develop a corpus-based Mandarin TTS system, a Min-Nan TTS system, a Hakka TTS system, and a Chinese-English bilingual TTS system. These systems will be integrated in the future to construct a multilingual TTS system covering the four primary languages used in Taiwan.

Similar resources

A flexible multilingual TTS development and speech research tool

Diverse synthesis methods and text-to-speech (TTS) architectures are being developed and applied almost every day. This tendency raises the need for durable program systems that effectively assist research and development in this area. A flexible development system for multilingual text-to-speech and general speech research is introduced. The system was developed for use with the Multivox and Pr...

Recent Advances in Multilingual Text-to-speech Synthesis

In this paper we will discuss recent advances in multilingual text-to-speech (TTS) synthesis research at AT&T Bell Laboratories. The TTS system developed at AT&T Bell Laboratories generates synthetic speech by concatenating segments of natural speech. The architecture of the system is designed as a modular pipeline where each module handles one particular step in the process of converting text ...

The Development of the Multilingual LUNA Corpus for Spoken Language System Porting

The development of annotated corpora is a critical process in the development of speech applications for multiple target languages. While the technology to develop a monolingual speech application has reached satisfactory results (in terms of performance and effort), porting an existing application from a source language to a target language is still a very expensive task. In this paper we addr...

Development of HMM-based Malay Text-to-Speech System

This paper presents the development of a hidden Markov model (HMM)-based Malay text-to-speech (TTS) system. To our knowledge, this is the first report on the development of an HMM-based speech synthesis system for the Malay language. In this paper, we first discuss the Malay speech characteristics, specifically the Malay phonological system and syllable structure. In the Malay phonological sys...

Multilingual text analysis for text-to-speech synthesis

We present a model of text analysis for text-to-speech (TTS) synthesis based on (weighted) finite-state transducers, which serves as the text-analysis module of the multilingual Bell Labs TTS system. The transducers are constructed using a lexical toolkit that allows declarative descriptions of lexicons, morphological rules, numeral-expansion rules, and phonological rules, inter alia. To date, ...

Publication date: 2006